BACKGROUND OF THE INVENTION

Cross-Reference to Related Application

This non-provisional patent application claims benefit of
provisional patent application U.S. Serial number 60/201,035, filed
May 1, 2000, now abandoned.

Federal Funding Legend 

This invention was produced in part using funds
obtained through a grant from the National Institute of Allergy and
Infectious Disease (AI31431). Consequently, the federal
government has certain rights in this invention.
Field of the Invention

The present invention relates generally to the fields of
microbiology, bacteriology and molecular biology. More
specifically, the present invention relates to the molecular cloning
and characterization of the Ehrlichia chaffeensis 28 kD outer
membrane protein multigene family.

Description of the Related Art

Ehrlichia are small, obligatory intracellular, gram-negative
bacteria which reside in endosomes inside host cells.
Ehrlichiae usually cause persistent infection in their natural animal
hosts (Andrew and Norval, 1989, Breitschwerdt et al., 1998, Dawson
et al., 1994, Dawson and Ewing, 1992, Harrus et al., 1998, Telford et
al., 1996). Persistent or prolonged Ehrlichia infections in human
hosts have also been documented (Dumler et al., 1993, Dumler and
Bakken, 1996, Horowitz et al., 1998, Roland et al., 1994). The
persistent infection may be caused by the antigenic variation of the
Ehrlichia omp-2 and p28 outer membrane protein family due to
differential expression or recombination of the msp-2 multigene
family (Palmer et al., 1994, Palmer et al., 1998) or the p28
multigene family (Ohashi et al., 1998b, Reddy et al., 1998, Yu et al.,
1999b).

The omp-2 and p28 are homologous gene families coding
for outer membrane proteins. The msp-2 multigene family has been
identified in A. marginale (Palmer et al., 1994), A. ovina (Palmer et
al., 1998), and the human granulocytotropic ehrlichiosis agent (Ijdo
et al., 1998, Murphy et al., 1998). The p28 multigene family has been
found in E. canis group ehrlichiae including E. canis, E. chaffeensis,
and E. muris (McBride et al., 1999a, 1999b, Ohashi et al., 1998a,
1998b, Reddy et al., 1998, Yu et al., 1999a, 1999b). The map-1
multigene family found in Cowdria ruminantium is more closely
related to the p28 multigene family than to the msp-2 multigene
family, both in sequence similarity and gene organization (Sulsona
et al., 1999, van Vliet et al., 1994). The msp-2 genes are dispersed
in the genome whereas the p28/map-1 genes are located in a single
locus.

To elucidate the mechanism of the host immune
avoidance involving the multigene family, the critical questions that
remain to be answered are how many genes are present in each
multigene family and which genes are silent or active. E. chaffeensis
is the pathogen of an emerging disease, human monocytotropic
ehrlichiosis. Recent studies have found seven homologous
polymorphic p28 genes in E. chaffeensis which encode proteins from
28 to 30-kDa (Ohashi et al., 1998b, Reddy et al., 1998). The seven
sequenced p28 genes were located in three loci of the E. chaffeensis
genome. The first locus, omp-1, contained six p28 genes. One gene
was partially sequenced (omp-1a) and five genes were completely
sequenced (omp-1b, -1c, -1d, -1e, and -1f) (Ohashi et al., 1998b).
The second locus contained a single p28 gene (Ohashi et al., 1998b,
Yu et al., 1999b). The third locus contained five p28 genes (ORF 1
to 5). The first four open reading frames overlapped with the DNA
sequences from omp-1c to omp-1f and the fifth open reading frame
overlapped with the single gene in the second locus. Therefore, the
three loci could be assembled into a single locus (Reddy et al.,
1998).

The prior art is deficient in knowledge of
many of the sequences of the genes in the p28 multigene family of E.
chaffeensis. The present invention fulfills this long-standing need
and desire in the art.
SUMMARY OF THE INVENTION



The 28-kDa outer membrane proteins (P28) of Ehrlichia
chaffeensis are encoded by a multigene family. The p28 multigene
family of E. chaffeensis is located in a single locus, which is easy to
sequence by genome walking. The purpose of the present study was to
determine all the p28 gene sequences and their transcriptional
activities. There were 21 members of the p28 multigene family
located in a 23-kb DNA fragment in the E. chaffeensis genome. The
p28 genes were 816 to 903 nucleotides in size and were separated
by intergenic spaces of 10 to 605 nucleotides. All the genes were
complete and were predicted to have signal sequences. The
molecular masses of the mature proteins were predicted to be 28-
to 32-kDa. The amino acid sequence identity of the P28 proteins
was 20-83%. Ten p28 genes were investigated for transcriptional
activity by using RT-PCR amplification of mRNA. Six of the 10 tested
p28 genes were actively transcribed in cell culture grown E.
chaffeensis. RT-PCR also indicated that each of the p28 genes was
monocistronic. These results suggest that the p28 genes are active
genes and encode polymorphic forms of the P28 proteins. In addition,
the P28s were divergent among separate isolates of E. chaffeensis.
The large repertoire of the p28 genes in a single ehrlichial organism
and antigenic diversity of the P28 among the isolates of E.
chaffeensis suggest that P28s may be involved in immune avoidance.

The present invention describes the molecular cloning,
sequencing, characterization, and expression of the multigene locus
of P28 from Ehrlichia chaffeensis. The present invention describes a
number of newly described genes for P28 proteins including
proteins having amino acid sequences selected from the group
consisting of SEQ ID No. 1, SEQ ID No. 2, SEQ ID No. 3, SEQ ID No. 4,
SEQ ID No. 5, SEQ ID No. 6, SEQ ID No. 7, SEQ ID No. 8, SEQ ID No. 9,
SEQ ID No. 10, SEQ ID No. 11, SEQ ID No. 12, SEQ ID No. 13, SEQ ID
No. 20 and SEQ ID No. 21. These P28 genes are contained in a single
23 kb multigene locus of Ehrlichia chaffeensis. The novel parts of
this locus are described in GenBank accession number AF230642
and GenBank accession number AF230643.

The instant invention is also directed to DNA encoding a
P28 protein selected from those described above. This DNA may
consist of isolated DNA that encodes a P28 protein; isolated DNA
which hybridizes to DNA encoding an isolated P28 gene; and isolated
DNA encoding a P28 protein which differs due to the degeneracy of
the genetic code.



The instant invention is also directed to a vector
comprising a P28 gene and regulatory elements necessary for
expression of the DNA in a cell. This vector may be used to
transfect a host cell selected from the group consisting of bacterial
cells, mammalian cells, plant cells and insect cells. E. coli is an
example of a bacterial cell into which the vector may be transfected.

The instant invention is also directed to an isolated and
purified Ehrlichia chaffeensis P28 surface protein selected from
those described above including those with amino acid sequences
SEQ ID No. 1, SEQ ID No. 2, SEQ ID No. 3, SEQ ID No. 4, SEQ ID No. 5,
SEQ ID No. 6, SEQ ID No. 7, SEQ ID No. 8, SEQ ID No. 9, SEQ ID No.
10, SEQ ID No. 11, SEQ ID No. 12, SEQ ID No. 13, SEQ ID No. 20 and
SEQ ID No. 21.

The instant invention also describes an antibody directed
against one of these P28 proteins. This antibody may be a
monoclonal antibody.

The novel P28 proteins of the instant invention may be 
used in a vaccine against Ehrlichia chaffeensis. 

Other and further aspects, features, and advantages of
the present invention will be apparent from the following
description of the presently preferred embodiments of the invention
given for the purpose of disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS 

So that the matter in which the above-recited features,
advantages and objects of the invention, as well as others which will
become clear, are attained and can be understood in detail, more
particular descriptions of the invention briefly summarized above
may be had by reference to certain embodiments thereof which are
illustrated in the appended drawings. These drawings form a part of
the specification. It is to be noted, however, that the appended
drawings illustrate preferred embodiments of the invention and
therefore are not to be considered limiting in their scope.

Figure 1 shows the scheme of sequencing the p28 gene
locus by genome walking and the organization of the p28 genes.
Three loci of p28 genes previously sequenced were aligned and
assembled into a single contiguous sequence. Initial primers (arrow
heads) were designed near the 5' and 3' ends of the contiguous
sequence to walk the genome. The block arrows represent the
positions and the directions of the p28 genes. The scale indicates
the nucleotides in kilobases.

Figure 2 shows a clustal alignment of the amino acid
sequences of the E. chaffeensis Arkansas strain P28s (1-21). P28-1
was used as consensus sequence. Dots represent residues
identical to those of the consensus sequence. Gaps, represented by
dashed lines, were introduced for optimal alignment of the DNA
sequences. The hypervariable regions are underlined.

Figure 3 shows the phylogenetic relationships of the
P28s (1-21). The numbers on the branches indicate the bootstrap
values.

Figure 4 shows Southern blotting. Two bands of 17.6
and 5.3 kb were detected by a p28 gene probe on Cla I restriction
endonuclease digested E. chaffeensis genomic DNA (lane E). M:
molecular weight marker.

Figure 5 shows RT-PCR amplification of the mRNA of E.
chaffeensis p28 genes (RT-PCR). In the PCR controls, reverse
transcriptase was omitted. The number on each lane indicates the
p28 gene. M represents a molecular weight marker.
DETAILED DESCRIPTION OF THE INVENTION



The following abbreviations may be used herein:
BCIP/NBT - 5-bromo-4-chloro-3-indolylphosphate/
nitrobluetetrazolium substrate; ATP - adenosine triphosphate; DNA
- deoxyribonucleic acid; E. - Ehrlichia; kDa - kilodalton; mRNA -
messenger ribonucleic acid; ORF - open reading frame; P28 -
28-kDa outer membrane proteins; PCR - polymerase chain reaction;
RT-PCR - reverse transcriptase-polymerase chain reaction.

In accordance with the present invention there may be
employed conventional molecular biology, microbiology, and
recombinant DNA techniques within the skill of the art. Such
techniques are explained fully in the literature. See, e.g., Maniatis,
Fritsch & Sambrook, "Molecular Cloning: A Laboratory Manual"
(1982); "DNA Cloning: A Practical Approach," Volumes I and II (D.N.
Glover ed. 1985); "Oligonucleotide Synthesis" (M.J. Gait ed. 1984);
"Nucleic Acid Hybridization" [B.D. Hames & S.J. Higgins eds.
(1985)]; "Transcription and Translation" [B.D. Hames & S.J. Higgins
eds. (1984)]; "Animal Cell Culture" [R.I. Freshney, ed. (1986)];
"Immobilized Cells And Enzymes" [IRL Press, (1986)]; B. Perbal, "A
Practical Guide To Molecular Cloning" (1984).

Therefore, if appearing herein, the following terms shall
have the definitions set out below.

A "replicon" is any genetic element (e.g., plasmid,
chromosome, virus) that functions as an autonomous unit of DNA
replication in vivo; i.e., capable of replication under its own control.

A "vector" is a replicon, such as plasmid, phage or
cosmid, to which another DNA segment may be attached so as to
bring about the replication of the attached segment.

A "DNA molecule" refers to the polymeric form of
deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in
either its single-stranded form or a double-stranded helix. This term
refers only to the primary and secondary structure of the molecule,
and does not limit it to any particular tertiary forms. Thus, this
term includes double-stranded DNA found, inter alia, in linear DNA
molecules (e.g., restriction fragments), viruses, plasmids, and
chromosomes. The structure is discussed herein according to the
normal convention of giving only the sequence in the 5' to 3'
direction along the nontranscribed strand of DNA (i.e., the strand
having a sequence homologous to the mRNA).
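The strand convention just described can be illustrated with a
minimal Python sketch. The function names and the 12-nucleotide
sequence are hypothetical and are not taken from the specification;
the sketch assumes an unambiguous A/C/G/T sequence written 5' to 3'.

```python
# Illustrative sketch only; the sequence is invented for demonstration.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(nontranscribed: str) -> str:
    """Return the transcribed (template) strand, written 5' to 3'."""
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return nontranscribed.upper().translate(_COMPLEMENT)[::-1]


def mrna_from_nontranscribed(nontranscribed: str) -> str:
    """The mRNA has the sequence of the nontranscribed strand, with U for T."""
    return nontranscribed.upper().replace("T", "U")


coding = "ATGGCTAAATAA"  # hypothetical nontranscribed strand, 5' to 3'
print(template_strand(coding))
print(mrna_from_nontranscribed(coding))
```

Only the nontranscribed strand needs to be written out, since the
template strand and the mRNA both follow from it.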




A DNA "coding sequence" is a double-stranded DNA
sequence that is transcribed and translated into a polypeptide in
vivo when placed under the control of appropriate regulatory
sequences. The boundaries of the coding sequence are determined
by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxyl) terminus. A coding sequence can
include, but is not limited to, prokaryotic sequences, cDNA from
eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g.,
mammalian) DNA, and even synthetic DNA sequences. A
polyadenylation signal and transcription termination sequence will
usually be located 3' to the coding sequence.
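By way of illustration, the boundaries of a coding sequence may be
located as in the following Python sketch. It assumes, for simplicity,
that ATG is the only start codon and that TAA, TAG and TGA are the
stop codons; the function name and test sequence are hypothetical.

```python
# Illustrative sketch only; real gene prediction considers alternative
# start codons, both strands, and minimum length thresholds.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_sequence_bounds(dna: str, start: str = "ATG"):
    """Return (begin, end) of the first start-to-stop reading frame.

    `end` is exclusive and includes the stop codon. Returns None when no
    start codon is followed by an in-frame stop codon.
    """
    dna = dna.upper()
    begin = dna.find(start)
    while begin != -1:
        # Walk codon by codon, in frame with the start codon.
        for i in range(begin + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return begin, i + 3
        begin = dna.find(start, begin + 1)
    return None


bounds = coding_sequence_bounds("CCATGGCTAAATAAGG")
print(bounds)
```

The returned indices delimit the start codon at the 5' end and the
stop codon at the 3' end of the reading frame.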

Transcriptional and translational control sequences are
DNA regulatory sequences, such as promoters, enhancers,
polyadenylation signals, terminators, and the like, that provide for
the expression of a coding sequence in a host cell.

A "promoter sequence" is a DNA regulatory region
capable of binding RNA polymerase in a cell and initiating
transcription of a downstream (3' direction) coding sequence. For
purposes of defining the present invention, the promoter sequence
is bounded at its 3' terminus by the transcription initiation site and
extends upstream (5' direction) to include the minimum number of
bases or elements necessary to initiate transcription at levels
detectable above background. Within the promoter sequence will
be found a transcription initiation site, as well as protein binding
domains (consensus sequences) responsible for the binding of RNA
polymerase. Eukaryotic promoters often, but not always, contain
"TATA" boxes and "CAT" boxes. Prokaryotic promoters contain
Shine-Dalgarno sequences in addition to the -10 and -35 consensus
sequences.

An "expression control sequence" is a DNA sequence that 
controls and regulates the transcription and translation of another 
DNA sequence. A coding sequence is "under the control" of 
transcriptional and translational control sequences in a cell when 
RNA polymerase transcribes the coding sequence into mRNA, which 
is then translated into the protein encoded by the coding sequence. 

A "signal sequence" can be included near the coding 
sequence. This sequence encodes a signal peptide, N-terminal to the 
polypeptide, that communicates to the host cell to direct the 
polypeptide to the cell surface or secrete the polypeptide into the 
media, and this signal peptide is clipped off by the host cell before 
the protein leaves the cell. Signal sequences can be found 
associated with a variety of proteins native to prokaryotes and
eukaryotes. 

The term "oligonucleotide", as used herein in referring to the
probe of the present invention, is defined as a molecule comprised
of two or more ribonucleotides, preferably more than three. Its
exact size will depend upon many factors which, in turn, depend
upon the ultimate function and use of the oligonucleotide.

The term "primer" as used herein refers to an
oligonucleotide, whether occurring naturally as in a purified
restriction digest or produced synthetically. A "primer" is capable
of acting as a point of initiation of synthesis when placed under
conditions in which synthesis of a primer extension product, which
is complementary to a nucleic acid strand, is induced (i.e., in the
presence of nucleotides and an inducing agent such as a DNA
polymerase and at a suitable temperature and pH). The primer may
be either single-stranded or double-stranded and must be
sufficiently long to prime the synthesis of the desired extension
product in the presence of the inducing agent. The exact length of
the primer will depend upon many factors, including temperature,
source of primer and use of the method. For example, for diagnostic
applications, depending on the complexity of the target sequence,
the oligonucleotide primer typically contains 15-25 or more
nucleotides, although it may contain fewer nucleotides.

The primers herein are selected to be "substantially"
complementary to different strands of a particular target DNA
sequence. This means that the primers must be sufficiently
complementary to hybridize with their respective strands.
Therefore, the primer sequence need not reflect the exact sequence
of the template. For example, a non-complementary nucleotide
fragment may be attached to the 5' end of the primer, with the
remainder of the primer sequence being complementary to the
strand. Alternatively, non-complementary bases or longer
sequences can be interspersed into the primer, provided that the
primer sequence has sufficient complementarity with the sequence
to hybridize therewith and thereby form the template for the
synthesis of the extension product.
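The case of a primer carrying a non-complementary 5' fragment can be
sketched in Python as follows. This is a deliberate simplification: it
requires exact complementarity of the remainder of the primer, whereas
actual hybridization tolerates some mismatches. The function names,
the template and the primer sequences are all hypothetical.

```python
# Illustrative sketch only; sequences are invented for demonstration.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def anneals(primer: str, template: str, tail_len: int = 0) -> bool:
    """Check whether a primer can pair with a site on the template.

    The first `tail_len` bases at the primer's 5' end are treated as a
    non-complementary fragment and ignored; the remainder must be
    exactly complementary to some stretch of the template strand.
    """
    core = primer.upper()[tail_len:]
    return bool(core) and reverse_complement(core) in template.upper()


template = "GGATCCTTAGCCATGGAA"   # hypothetical target strand, 5' to 3'
primer = "GAATTC" + "CCATGGCTAA"  # 6-base 5' tail plus complementary core
print(anneals(primer, template, tail_len=6))
print(anneals(primer, template, tail_len=0))
```

With the 5' tail ignored, the primer pairs with the template; when the
tail is required to pair as well, it does not.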

A cell has been "transformed" by exogenous or
heterologous DNA when such DNA has been introduced inside the
cell. The transforming DNA may or may not be integrated
(covalently linked) into the genome of the cell. In prokaryotes,
yeast, and mammalian cells for example, the transforming DNA may
be maintained on an episomal element such as a plasmid. With
respect to eukaryotic cells, a stably transformed cell is one in which
the transforming DNA has become integrated into a chromosome so
that it is inherited by daughter cells through chromosome
replication. This stability is demonstrated by the ability of the
eukaryotic cell to establish cell lines or clones comprised of a
population of daughter cells containing the transforming DNA. A
"clone" is a population of cells derived from a single cell or ancestor
by mitosis. A "cell line" is a clone of a primary cell that is capable
of stable growth in vitro for many generations.

Two DNA sequences are "substantially homologous" 
when at least about 75% (preferably at least about 80%, and most 
preferably at least about 90% or 95%) of the nucleotides match over 
the defined length of the DNA sequences. Sequences that are 
substantially homologous can be identified by comparing the 
sequences using standard software available in sequence data banks, 
or in a Southern hybridization experiment under, for example, 
stringent conditions as defined for that particular system. Defining 
appropriate hybridization conditions is within the skill of the art. 
See, e.g., Maniatis et al., supra; DNA Cloning, Vols. I & II, supra; 
Nucleic Acid Hybridization, supra. 
A "heterologous" region of the DNA construct is an
identifiable segment of DNA within a larger DNA molecule that is not
found in association with the larger molecule in nature. Thus, when
the heterologous region encodes a mammalian gene, the gene will
usually be flanked by DNA that does not flank the mammalian
genomic DNA in the genome of the source organism. Another
example of a heterologous coding sequence is a construct where the
coding sequence itself is not found in nature (e.g., a cDNA where the
genomic coding sequence contains introns, or synthetic sequences
having codons different than the native gene). Allelic variations or
naturally occurring mutational events do not give rise to a
heterologous region of DNA as defined herein.

The labels most commonly employed for these studies
are radioactive elements, enzymes, chemicals which fluoresce when
exposed to ultraviolet light, and others. A number of fluorescent
materials are known and can be utilized as labels. These include,
for example, fluorescein, rhodamine, auramine, Texas Red, AMCA
blue and Lucifer Yellow. A particular detecting material is
anti-rabbit antibody prepared in goats and conjugated with
fluorescein through an isothiocyanate.
Proteins can also be labeled with a radioactive element
or with an enzyme. The radioactive label can be detected by any of
the currently available counting procedures. The preferred isotope
may be selected from 3H, 14C, 32P, 35S, 36Cl, 51Cr, 57Co, 58Co, 59Fe,
90Y, 125I, 131I, and 186Re.

Enzyme labels are likewise useful, and can be detected by
any of the presently utilized colorimetric, spectrophotometric,
fluorospectrophotometric, amperometric or gasometric techniques.
The enzyme is conjugated to the selected particle by reaction with
bridging molecules such as carbodiimides, diisocyanates,
glutaraldehyde and the like. Many enzymes which can be used in
these procedures are known and can be utilized. The preferred are
peroxidase, β-glucuronidase, β-D-glucosidase, β-D-galactosidase,
urease, glucose oxidase plus peroxidase and alkaline phosphatase.
U.S. Patent Nos. 3,654,090, 3,850,752, and 4,016,043 are referred
to by way of example for their disclosure of alternate labeling
material and methods.

As used herein, the term "host" is meant to include not
only prokaryotes but also eukaryotes such as yeast, plant and
animal cells. A recombinant DNA molecule or gene which encodes a
28-kDa immunoreactive protein of Ehrlichia chaffeensis of the
present invention can be used to transform a host using any of the
techniques commonly known to those of ordinary skill in the art.
Especially preferred is the use of a vector containing coding
sequences for a gene encoding a 28-kDa immunoreactive protein of
Ehrlichia chaffeensis of the present invention for purposes of
prokaryote transformation.

Prokaryotic hosts may include E. coli, S. typhimurium,
Serratia marcescens and Bacillus subtilis. Eukaryotic hosts include
yeasts such as Pichia pastoris, mammalian cells and insect cells.

In general, expression vectors containing promoter 
sequences which facilitate the efficient transcription of the inserted 
DNA fragment are used in connection with the host. The expression 
vector typically contains an origin of replication, promoter(s), 
terminator(s), as well as specific genes that are capable of providing 
phenotypic selection in transformed cells. The transformed hosts 
can be fermented and cultured according to means known in the art 
to achieve optimal cell growth. 

By "high stringency" is meant DNA hybridization and 
wash conditions characterized by high temperature and low salt 
concentration, e.g., wash conditions of 65°C at a salt concentration 
of approximately 0.1 x SSC, or the functional equivalent thereof.
For example, high stringency conditions may include hybridization
at about 42°C in the presence of about 50% formamide; a first wash 
at about 65°C with about 2 x SSC containing 1% SDS; followed by a 
second wash at about 65°C with about 0.1 x SSC. 
By "substantially pure DNA" is meant DNA that is not
part of a milieu in which the DNA naturally occurs, by virtue of
separation (partial or total purification) of some or all of the
molecules of that milieu, or by virtue of alteration of sequences that
flank the claimed DNA. The term therefore includes, for example, a
recombinant DNA which is incorporated into a vector, into an
autonomously replicating plasmid or virus, or into the genomic DNA
of a prokaryote or eukaryote; or which exists as a separate molecule
(e.g., a cDNA or a genomic or cDNA fragment produced by
polymerase chain reaction (PCR) or restriction endonuclease
digestion) independent of other sequences. It also includes a
recombinant DNA that is part of a hybrid gene encoding additional
polypeptide sequence, e.g., a fusion protein.

The identity between two sequences is a direct function
of the number of matching or identical positions. When a subunit
position in both of the two sequences is occupied by the same
monomeric subunit, e.g., if a given position is occupied by an
adenine in each of two DNA molecules, then they are identical at
that position. For example, if 7 positions in a sequence
10 nucleotides in length are identical to the corresponding
positions in a second 10-nucleotide sequence, then the two
sequences have 70% sequence identity. The length of comparison
sequences will generally be at least 50 nucleotides, preferably at
least 60 nucleotides, more preferably at least 75 nucleotides, and
most preferably 100 nucleotides. Sequence identity is typically
measured using sequence analysis software (e.g., Sequence Analysis
Software Package of the Genetics Computer Group, University of
Wisconsin Biotechnology Center, 1710 University Avenue, Madison,
WI 53705).
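The 70% example above may be reproduced with the following Python
sketch. It assumes two sequences of equal length that are already
aligned position by position, without gaps; sequence analysis software
of the kind cited above additionally performs the alignment itself.
The function name and the two 10-nucleotide sequences are hypothetical.

```python
# Illustrative sketch only; no alignment or gap handling is performed.
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


seq1 = "ACGTACGTAC"  # hypothetical 10-nucleotide sequence
seq2 = "ACGTACGGTA"  # identical to seq1 at 7 of 10 positions
print(percent_identity(seq1, seq2))  # 70.0
```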

A "vector" may be defined as a replicable nucleic acid
construct, e.g., a plasmid or viral nucleic acid. Vectors may be used
to amplify and/or express nucleic acid encoding a 28-kDa
immunoreactive protein of Ehrlichia chaffeensis. An expression
vector is a replicable construct in which a nucleic acid sequence
encoding a polypeptide is operably linked to suitable control
sequences capable of effecting expression of the polypeptide in a
cell. The need for such control sequences will vary depending upon
the cell selected and the transformation method chosen. Generally,
control sequences include a transcriptional promoter and/or
enhancer, suitable mRNA ribosomal binding sites, and sequences
which control the termination of transcription and translation.
Methods, which are well known to those skilled in the art, can be
used to construct expression vectors containing appropriate
transcriptional and translational control signals. See for example,
the techniques described in Sambrook et al., 1989, Molecular
Cloning: A Laboratory Manual (2nd Ed.), Cold Spring Harbor Press,
N.Y. A gene and its transcription control sequences are defined as
being "operably linked" if the transcription control sequences
effectively control the transcription of the gene. Vectors of the
invention include, but are not limited to, plasmid vectors and viral
vectors. Preferred viral vectors of the invention are those derived
from retroviruses, adenovirus, adeno-associated virus, SV40 virus,
or herpes viruses.

By a "substantially pure protein" is meant a protein that
has been separated from at least some of those components that
naturally accompany it. Typically, the protein is substantially pure
when it is at least 60%, by weight, free from the proteins and other
naturally occurring organic molecules with which it is naturally
associated in vivo. Preferably, the purity of the preparation is at
least 75%, more preferably at least 90%, and most preferably at
least 99%, by weight. A protein is substantially free of naturally
associated components when it is separated from at least some of
those contaminants that accompany it in its natural state. Thus, a
protein that is chemically synthesized or produced in a cellular
system different from the cell from which it naturally originates will
be, by definition, substantially free from its naturally associated
components. Accordingly, substantially pure proteins include
eukaryotic proteins synthesized in E. coli, other prokaryotes, or any
other organism in which they do not naturally occur.

The phrase "pharmaceutically acceptable" refers to
molecular entities and compositions that do not produce an allergic
or similar untoward reaction when administered to a human. The
preparation of an aqueous composition that contains a protein as an
active ingredient is well understood in the art. Typically, such
compositions are prepared as injectables, either as liquid solutions
or suspensions; solid forms suitable for solution in, or suspension
in, liquid prior to injection can also be prepared. The preparation
can also be emulsified.

A protein may be formulated into a composition in a
neutral or salt form. Pharmaceutically acceptable salts include the
acid addition salts (formed with the free amino groups of the
protein), which are formed with inorganic acids such as, for
example, hydrochloric or phosphoric acids, or such organic acids as
acetic, oxalic, tartaric, mandelic, and the like. Salts formed with the
free carboxyl groups can also be derived from inorganic bases such
as, for example, sodium, potassium, ammonium, calcium, or ferric
hydroxides, and such organic bases as isopropylamine,
trimethylamine, histidine, procaine and the like.

Upon formulation, solutions will be administered in a
manner compatible with the dosage formulation and in such amount
as is therapeutically effective. The formulations are easily
administered in a variety of dosage forms such as injectable
solutions.

For parenteral administration in an aqueous solution, for
example, the solution should be suitably buffered if necessary and
the liquid diluent first rendered isotonic with sufficient saline or
glucose. These particular aqueous solutions are especially suitable
for intravenous, intramuscular, subcutaneous and intraperitoneal
administration. In this connection, sterile aqueous media that can
be employed will be known to those of skill in the art in light of the
present disclosure. For example, one dosage could be dissolved in 1
ml of isotonic NaCl solution and either added to 1000 mL of
hypodermoclysis fluid or injected at the proposed site of infusion
(see for example, "Remington's Pharmaceutical Sciences" 15th
Edition, pages 1035-1038 and 1570-1580). Some variation in
dosage will necessarily occur depending on the condition of the
subject being treated. The person responsible for administration
will, in any event, determine the appropriate dose for the individual
subject.

As is well known in the art, a given polypeptide may vary
in its immunogenicity. It is often necessary therefore to couple the
immunogen (e.g., a polypeptide of the present invention) with a
carrier. Exemplary and preferred carriers are keyhole limpet
hemocyanin (KLH) and human serum albumin. Other carriers may
include a variety of lymphokines and adjuvants such as IL2, IL4, IL8
and others.

Means for conjugating a polypeptide to a carrier protein
are well known in the art and include glutaraldehyde,
m-maleimidobenzoyl-N-hydroxysuccinimide ester, carbodiimide and
bis-diazotized benzidine. It is also understood that the peptide may
be conjugated to a protein by genetic engineering techniques that
are well known in the art.



As is also well known in the art, immunogenicity to a
particular immunogen can be enhanced by the use of non-specific
stimulators of the immune response known as adjuvants. Exemplary
and preferred adjuvants include complete BCG, Detox, RIBI
(Immunochem Research Inc.), ISCOMS and aluminum hydroxide
adjuvant (Superphos, Biosector).

As used herein the term "complement" is used to define 
the strand of nucleic acid which will hybridize to the first nucleic 
acid sequence to form a double stranded molecule under stringent 
conditions. Stringent conditions are those that allow hybridization 
between two nucleic acid sequences with a high degree of homology, 
but preclude hybridization of random sequences. For example, 
hybridization at low temperature and/or high ionic strength is 
termed low stringency and hybridization at high temperature and/or 
low ionic strength is termed high stringency. The temperature and 
ionic strength of a desired stringency are understood to be 
applicable to particular probe lengths, to the length and base 
content of the sequences and to the presence of formamide in the 
hybridization mixture. 

As used herein, the term "engineered" or "recombinant" 
cell is intended to refer to a cell into which a recombinant gene, 
such as a gene encoding an Ehrlichia chaffeensis antigen, has been 
introduced. Therefore, engineered cells are distinguishable from 
naturally occurring cells that do not contain a recombinantly 
introduced gene. Engineered cells are thus cells having a gene or 
genes introduced through the hand of man. Recombinantly 
introduced genes will either be in the form of a cDNA gene, a copy 
of a genomic gene, or will include genes positioned adjacent to a 
promoter not naturally associated with the particular introduced 
gene. In addition, the recombinant gene may be integrated into the 
host genome, or it may be contained in a vector, or in a bacterial 
genome transfected into the host cell. 

The following examples are given for the purpose of 
illustrating various embodiments of the invention and are not meant 
to limit the present invention in any fashion. 


EXAMPLE 1 

Ehrlichia spp 

Ehrlichia chaffeensis (Arkansas strain) was obtained 
from Jacqueline Dawson (Centers for Disease Control and 
Prevention, Atlanta, GA). Ehrlichiae were cultivated in DH82 cells, a 
canine macrophage-like cell line. DH82 cells were harvested with a 
cell scraper when 100% of cells were infected with ehrlichiae. The 
cells were centrifuged at 17,400 x g for 20 min. The pellets were 
disrupted twice with a Braun-Sonic 2000 sonicator at 40 W for 30 
sec on ice. Ehrlichiae were then purified by using 30% Percoll 
gradient centrifugation (Weiss et al., 1989). 



EXAMPLE 2 

PCR amplification of the p28 multigene locus 

Ehrlichia chaffeensis genomic DNA was prepared by 
using an IsoQuick Nucleic Acid Extraction Kit (ORCA Research Inc., 
Bothell, WA) according to the instructions of the manufacturer. The 
unknown sequences of the p28 multigene locus were amplified by 
PCR using the Universal GenomeWalker Kit (Clontech Laboratories, 
Inc., Palo Alto, CA). Briefly, the E. chaffeensis genomic DNA was 
digested separately with Dra I, EcoR V, Pvu II, Sca I, and Stu I. These 
enzymes were chosen because they generate blunt-ended DNA 
fragments that can be ligated to the blunt end of the adapter. The digested 
E. chaffeensis genomic DNA fragments were ligated with a 
GenomeWalker Adapter, which had one blunt end and one end with a 
5' overhang. The ligation mixture of the adapter and E. chaffeensis 
genomic DNA fragments was used as template for PCR. Initially, the 
p28 gene-specific primer amplified the known DNA sequence and 
extended into the unknown adjacent genomic DNA and the adapter 
5' overhang, which is complementary to the adapter primer. In the 
subsequent PCR cycles, the target DNA sequences were amplified 
with both the p28 gene-specific primer and the adapter primer. 
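
The walking step described above can be illustrated with a minimal sketch. It assumes the standard recognition sequences of the five blunt-cutting enzymes named in the text; the function name walk_lengths, the toy sequence, and the primer coordinate are hypothetical and are not part of the GenomeWalker kit. For each library, the expected product is taken to run from the 5' end of the gene-specific primer to the nearest downstream cut site (adapter length is ignored).

```python
# Recognition sequences of the blunt-cutting enzymes named in the text.
# Each enzyme cuts in the middle of its 6-bp palindromic site.
BLUNT_SITES = {
    "Dra I": "TTTAAA",
    "EcoR V": "GATATC",
    "Pvu II": "CAGCTG",
    "Sca I": "AGTACT",
    "Stu I": "AGGCCT",
}


def walk_lengths(genome, primer_start):
    """Return, per enzyme, the distance from the primer 5' end to the
    nearest downstream blunt cut, or None if no site is found.

    genome       -- top-strand sequence (hypothetical toy input)
    primer_start -- 0-based position of the primer 5' end (assumption)
    """
    result = {}
    for enzyme, site in BLUNT_SITES.items():
        idx = genome.find(site, primer_start)
        # The cut falls after the third base of the 6-bp site.
        result[enzyme] = None if idx == -1 else idx + 3 - primer_start
    return result


if __name__ == "__main__":
    # Toy sequence with one Stu I site and one Pvu II site.
    toy = "ACGT" * 5 + "AGGCCT" + "ACGT" * 5 + "CAGCTG" + "ACGT" * 2
    for enzyme, length in walk_lengths(toy, primer_start=5).items():
        print(enzyme, length)
```

Libraries whose nearest site lies farther from the primer give longer walks, which is one reason several enzymes are used in parallel.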



EXAMPLE 3 

DNA sequencing 

The PCR products were purified by using a QIAquick PCR 
Purification Kit (QIAGEN Inc., Santa Clarita, CA) and were sequenced 
directly using PCR primers when a single clear band was observed on 
the ethidium-bromide stained agarose gel. If multiple bands 
appeared, the DNA band of interest was excised from the gel, and 
the DNA was extracted from the gel using the Gel Extraction Kit 
(QIAGEN Inc., Santa Clarita, CA). The gel-purified DNA was cloned 
into the Topo TA cloning vector (Invitrogen, Inc., Carlsbad, CA) 
according to the instructions of the manufacturer. A High Pure 
Plasmid Isolation Kit (Boehringer Mannheim Corp., Indianapolis, IN) 
was used to purify the plasmids. An ABI Prism 377 DNA Sequencer 
(Perkin-Elmer Applied Biosystems, Foster City, CA) was used to 
sequence the DNA in the Protein Chemistry Laboratory of the 
University of Texas Medical Branch. 



EXAMPLE 4 

Gene analysis 

DNA sequences and deduced amino acid sequences were 
analyzed using DNASTAR software (DNASTAR, Inc., Madison, WI). 
The signal sequence of the deduced protein was analyzed by using 
the PSORT program, which predicts the presence of signal sequences 
(McGeoch, 1985, Von Heijne, 1986) and detects potential 
transmembrane domains (Klein, 1985). Phylogenetic analysis was 
performed by the maximum parsimony method of the PAUP 4.0 
software (Sinauer Associates, Sunderland, Massachusetts, 1998). 
Bootstrap values for the consensus tree were based on analysis of 
1000 replicates. 



EXAMPLE 5 

DNA sequence accession numbers 

The DNA sequences of the E. chaffeensis p28 genes were 
assigned GenBank accession numbers: AF230642 for the DNA locus 
of the p28-1 to p28-13 and AF230643 for the DNA locus of p28-20 
and p28-21. 

EXAMPLE 6 

Reverse transcriptase PCR (RT-PCR) 

Total RNA of E. chaffeensis-infected DH82 cells was 
isolated using RNeasy Total RNA Isolation Kit (Qiagen Inc., Santa 
Clarita, CA). The p28 gene mRNA (0.5 µg total RNA) was amplified 
using a Titan One Tube RT-PCR System (Roche Molecular 
Biochemicals, Indianapolis, IN) according to the manufacturer's 
instructions. Gene-specific primer pairs used in the RT-PCR reaction 
are listed in Table 1. A negative control that included all reagents 
except reverse transcriptase was included to confirm that genomic 
DNA was not present in the total RNA preparation. The thermal 
cycling profile consisted of reverse transcription at 50°C for 30 min, 
amplification for 30 cycles at 94°C for 2 min, 50°C for 1 min, and 
68°C for 1 min, and an elongation step at 68°C for 7 min. 
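
As a quick check on the thermal cycling profile above, the total programmed hold time can be added up. The sketch below is only arithmetic on the stated steps; the function name total_hold_minutes is hypothetical, and instrument ramp times between temperatures are not included.

```python
def total_hold_minutes(rt_min, cycles, step_minutes, final_min):
    """Sum the programmed hold times of a one-tube RT-PCR profile.

    rt_min       -- reverse transcription hold (min)
    cycles       -- number of amplification cycles
    step_minutes -- hold time of each step within one cycle (min)
    final_min    -- final elongation hold (min)
    Ramp times are ignored (assumption).
    """
    return rt_min + cycles * sum(step_minutes) + final_min


if __name__ == "__main__":
    # 50 C for 30 min; 30 cycles of 2 + 1 + 1 min; 68 C for 7 min.
    minutes = total_hold_minutes(30, 30, [2, 1, 1], 7)
    print(f"{minutes} min ({minutes / 60:.2f} h)")
```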
TABLE 1 



Gene-specific primers for RT-PCR 

Gene     Sequences of forward (f) and reverse (r) primers          Product length (bp) 

p28-10   (f) ACG TGA TAT GGA AAG CAA CAA GT (SEQ ID No. 22)        384 
         (r) GCG CCG AAA TAT CCA ACA (SEQ ID No. 23) 

p28-11   (f) GGT CAA ACT TGC CCT AAA CAC A (SEQ ID No. 24)         406 
         (r) ACT TCA CCA CCA AAA TAC CCA ATA (SEQ ID No. 25) 

p28-12   (f) CTG CTG GCA TTA GTT ACC C (SEQ ID No. 26)             334 
         (r) CAT AGC AGC CAT TGA CC (SEQ ID No. 27) 

p28-13   (f) ATT GAT TGC CTA TTA CTT GAT GGT (SEQ ID No. 28)       333 
         (r) AAT GGG GCT GTT GGT TAC TC (SEQ ID No. 29) 

p28-14   (f) TGA AGA CGC AAT AGC AGA TAA GA (SEQ ID No. 30)        269 
         (r) TAG CGC AGA TGT GGT TTG AG (SEQ ID No. 31) 

p28-15   (f) ACT GTC GCG TTG TAT GGT TTG (SEQ ID No. 32)           371 
         (r) ATT AGT GCT GCT TGC TTT ACG A (SEQ ID No. 33) 

p28-17   (f) TGC AAG GTG ACA ATA TTA GTG GTA (SEQ ID No. 34)       367 
         (r) GTA TTC CGC TGT TGT CTT GTT G (SEQ ID No. 35) 

p28-18   (f) ACA TTT TGG CGT ATT CTC TGC (SEQ ID No. 36)           312 
         (r) TAG CTT TCC CCC ACT GTT ATG (SEQ ID No. 37) 

p28-20   (f) AAC TTA TGG CTT TCT CCT CCT TTC (SEQ ID No. 38)       340 
         (r) TTG CCT GAT AAT TCT TTT TCT GAT (SEQ ID No. 39) 

p28-21   (f) ACC AAC TTC CCA ACC AAA ATA ATC (SEQ ID No. 40)       421 
         (r) CTG AAG GAG GAG AAA GCC ATA AGT (SEQ ID No. 41) 
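
Basic properties of the primers in Table 1 can be tabulated with a short script. The sketch below is illustrative only: it uses the p28-10 primer pair as printed in the table, and the helper names clean, gc_percent, and reverse_complement are hypothetical. It reports length and GC content and makes no claim about how the primers were originally designed.

```python
# p28-10 primer pair copied from Table 1 (spaces separate triplets).
PRIMERS = {
    "p28-10 (f)": "ACG TGA TAT GGA AAG CAA CAA GT",
    "p28-10 (r)": "GCG CCG AAA TAT CCA ACA",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean(primer):
    """Strip the triplet spacing and normalise to upper case."""
    return primer.replace(" ", "").upper()


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    seq = clean(primer)
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(primer):
    """Reverse complement, i.e. the primer as read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(primer)))


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, len(clean(primer)), "nt",
              f"{gc_percent(primer):.1f}% GC",
              reverse_complement(primer))
```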



EXAMPLE 7 



Southern blotting 

The DNA sequences of the p28 multigene locus were 
analyzed for the presence of restriction sites using the MapDraw 
program (DNASTAR, Inc., Madison, WI). Ehrlichia chaffeensis 
genomic DNA was digested by restriction endonuclease Cla I. The 
DNA was separated using a 0.8% agarose gel. DNA was blotted onto 
nylon membranes by capillary transfer. The probe was DNA 
amplified from the p28 multigene locus by using PCR and was 
labeled with digoxigenin-11-dUTP using a DIG DNA Labeling Kit 
(Roche Molecular Biochemicals, Indianapolis, IN). The probe 
corresponded to the nucleotides from 8900 to 10620 of the locus, 
which included the 3' end of p28-7, the entire gene of p28-8, the 5' 
end of p28-9, and the intergenic sequences between the three genes. 
DNA hybridization was performed at 42°C overnight in the Eazy 
Hybridization Buffer (Roche Molecular Biochemicals, Indianapolis, 
IN). The DNA probes were detected using the colorimetric reagent 
(BCIP/NBT) following the instructions of the manufacturer (Roche 
Molecular Biochemicals, Indianapolis, IN). 

EXAMPLE 8 



PCR amplification of the p28 multigene locus 

The sequences of three p28 gene loci were obtained 
from GenBank (accessions: AF021338, AF062761, and AF068234) 
(Ohashi et al., 1998b, Reddy et al., 1998, Yu et al., 1999b) and were 
assembled into a single contiguous DNA sequence which contained 
seven p28 genes with the first one incomplete. Gene-specific 
primers to the partial gene (primer 1a-r1 and primer 1a-r2) and the 
DNA sequence downstream of the last p28 gene (primers 28f1 and 
28f2) were designed from the contiguous sequence for the initial 
extension of the p28 gene locus of E. chaffeensis. 

The scheme of PCR-amplification of the p28 multigene 
locus is illustrated in Fig. 1, and the sequences of the gene-specific 
primers are listed in Table 2. A 1.6-kb DNA fragment was 
amplified initially from the 5' end of the locus from a 
Stu I-restriction genomic library by nested PCR using primer 1a-r2. The 
PCR products were sequenced directly, and a new primer (28r3) was 
designed from the sequence to further extend the 5' end sequence 
of the locus. A 4.5-kb DNA fragment (pvu4.5) was amplified from a 
Pvu II-restriction genomic library by using primer 28r3. The 5' end 
of the DNA locus was further extended with six additional primer 
walks by using primers: pvur32, 28r12, 28stur, 28r14, and 28r15. 
Each primer was designed from the DNA sequences from the 
preceding PCR product. The 3' end of the locus was initially 
extended for 1.5-kb by nested PCR using primers 28f1 and 28f2. 
The 1.5-kb DNA fragment was directly sequenced and used to design 
a new primer (28f3) to further walk the 3' end of the locus. 

A 2.8-kb DNA fragment (stu2.8) was amplified from a 
Stu I-restriction genomic library by using primer 28f3. The pvu4.5, 
pvu1.8, and stu2.8 DNA fragments were gel-purified and cloned into 
the Topo TA PCR cloning vector. The DNA in the Topo TA vector was 
sequenced initially using the M13 reverse and M13 forward primers 
and extended by primer walking. The sequence on the 5' end of 
stu2.8 was not readable following M13 forward and reverse primers, 
possibly due to the secondary structure. Thus, the recombinant 
Topo TA plasmid containing the stu2.8 DNA was digested with the 
restriction enzyme Kpn I. A 700-bp fragment of DNA was deleted 
from the 5' end of the stu2.8 DNA. The plasmid was ligated again, 
and the insert was sequenced using M13 reverse and M13 forward 
primers. The rest of the PCR products were sequenced directly. 

TABLE 2 

Primers for genome walking the E. chaffeensis p28 multigene locus 

Name          Sequences                                                  Product length (kb) 

1a-r1 (a)     ACC AAA GTA TGC AAT GTC AAG TG (SEQ ID No. 42) 
1a-r2         CTG CAG ATG TGA CTT TAG GAG ATT C (SEQ ID No. 43)          1.6 
28r3          TGT ATA TCT TCC AGG GTC TTT GA (SEQ ID No. 44)             4.5 
pvur32        GAC CAT TCT ACC TCA ACC (SEQ ID No. 45)                    1.8 
(illegible)   ATA TCC AAT TGC TCC ACT GAA A (SEQ ID No. 46)              1.5 
28r12         CTT GAA ATG TAA CAG TAT ATG GAC CTT GAA (SEQ ID No. 47)    2.2 
(illegible)   TGT CCT TTT TAA GCC CAA CT (SEQ ID No. 48)                 1.5 
28r14         TTC TGC AGA TTG ATG TGG ATG TTT (SEQ ID No. 49)            4.7 
28r15         TGC AGA TTG ATG TGG ATG TTT (SEQ ID No. 50)                1.1 
28f1 (b)      GTA AAA CAC AAG CCA CCA GTC T (SEQ ID No. 51) 
28f2          GGG CAT ATA CCT ACA CCA AAC ACC (SEQ ID No. 52)            1.5 
28f3          TAA GAG GAT TGG GTA AGG ATA (SEQ ID No. 53)                2.8 

a: 1a-r1 was the outside primer for 1a-r2; b: 28f1 was the outside primer for 28f2. 



EXAMPLE 9 



p28 gene family consists of 21 homologous but distinct genes 

The sequences of the DNA fragments were assembled 
together by using the Seqman program (DNASTAR, Inc., Madison, 
WI) into a 23-kb segment of DNA. There were 21 homologous p28 
genes in the DNA locus. The genes were designated as p28-1 to p28-21 
according to their positions from the 5' end to the 3' end of the 
locus (Fig. 1). Most of the genes were tandemly arranged in one 
direction in the locus, and the last two genes (p28-20 and p28-21) 
were in the complementary strand. The sizes of the genes ranged 
from 816 bp to 903 bp while the length of the non-coding sequences 
between the neighboring genes varied from 10 to 605 bp. The 
intergenic spaces between p28-1 and p28-2 and between p28-6 and 
p28-7 encoded a 150 amino acid protein and a 195 amino acid 
protein, respectively, and the two proteins had no sequence 
similarity to any known proteins. On the 5' end of the locus, there 
is a 1347 nucleotide open reading frame, which was similar to the clpX 
gene, a class-III heat-shock gene encoding an ATP-dependent 
protease. 

All the P28s were predicted to have a signal sequence. 
The signal sequences of P28-1, P28-7, and P28-8 were predicted to 
be uncleavable. The signal sequences of the rest of the P28s were 
predicted to be cleavable, and the proteins were predicted to be 
cleaved at positions varying from position 19 to position 30. The 
predicted molecular sizes of the mature P28s were from 25.8-kDa to 
32.1-kDa. The C-termini and the middle of the proteins were most 
conserved. There were 4 hypervariable regions in the amino acid 
sequences of the P28 proteins (Fig. 2). The first hypervariable 
region was immediately after the signal sequence. No proteins had 
identical sequences in the hypervariable regions (Fig. 2). 



EXAMPLE 10 

Phylogenetic relationships of the P28s 

The amino acid sequence identity of the P28s varied 
from 20% to 83% (Fig. 3). In general, the proteins derived from 
adjacent genes had higher identities. The P28s having the highest 
amino acid sequence identities were P28-16 to P28-19, which 
were 68.3 to 82.7% identical to each other. The next group with 
high sequence identity was P28-7 to P28-13, which were 47.6 
to 66.9% identical to each other. The sequence identity among the 
rest of the E. chaffeensis P28s ranged from 19.7 to 45.6%. 
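
Percent identity values such as those above are computed from pairwise alignments. The following is a minimal sketch of one common convention, assuming two pre-aligned sequences of equal length with "-" as the gap character and counting only columns where neither sequence has a gap. The function name percent_identity and the toy sequences are hypothetical; the alignment software used in this study may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    Both inputs must already be aligned (same length). Columns with a
    gap in either sequence are excluded from the comparison (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy aligned peptides, not taken from the P28 sequences.
    print(percent_identity("MKAL-T", "MKVLGT"))
```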

The amino acid sequences of the P28s of E. chaffeensis 
were highly homologous to the P28 protein families of E. canis and E. 
muris (McBride et al., 1999a, 1999b, Reddy et al., 1998, Yu et al., 
1999a) and the MAP-1 protein family of C. ruminantium (Van Vliet 
et al., 1994, Sulsona et al., 1999). P28-17 of E. chaffeensis was the 
most conserved protein among the Ehrlichia species. The amino 
acid sequence of the E. chaffeensis P28-17 was 58% to 60% identical 
to the P28s of E. canis and 78% to 81% identical to the P28s of E. 
muris. The P28s of E. chaffeensis also have significant similarity to 
the MSP-4 protein (Oberle and Barbet, 1993), and the MSP-2 protein 
families of A. marginale (Palmer et al., 1994) and the MSP-2 of the 
human granulocytotropic ehrlichiosis agent (Ijdo et al., 1998, 
Murphy et al., 1998). 



EXAMPLE 11 



p28 genes located in a single locus 

Southern blotting was performed to detect whether all 
the p28 genes were located on a single locus and whether the whole 
locus had been sequenced. Cla I restriction endonuclease was 
predicted to digest the p28 gene locus at three sites generating 5268 
bp and 17550 bp DNA fragments. Southern blot using a p28 gene 
probe demonstrated a strong band of 17.6-kb and a weak band of 
5.3-kb in the Cla I-digested E. chaffeensis genomic DNA (Fig. 4). 
This result indicated that all the p28 genes were located on two Cla I 
DNA fragments and that all the p28 genes had been sequenced. 
Sequencing a segment of 2.3 kb DNA upstream of the first p28 gene 
and a segment of 2 kb downstream of the last p28 gene did not 
reveal any additional p28 genes. 
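
The fragment prediction above can be reproduced in outline. Three cut sites in a sequenced segment define two fragments that are bounded by a site on both sides, which is consistent with two predicted sizes (5268 + 17550 = 22818 bp within the 23-kb segment); the outermost pieces extend into unsequenced flanking DNA and have no predictable size. The sketch below assumes the standard Cla I site (AT^CGAT) and uses a hypothetical toy sequence; the function names are illustrative and do not reproduce the MapDraw program.

```python
CLA_I_SITE = "ATCGAT"   # Cla I recognition sequence
CLA_I_CUT_OFFSET = 2    # top strand is cut after the second base: AT^CGAT


def cut_positions(seq, site=CLA_I_SITE, offset=CLA_I_CUT_OFFSET):
    """Return the 0-based top-strand cut coordinates for every site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def internal_fragments(cuts):
    """Sizes of fragments that have a cut site at both ends."""
    return [right - left for left, right in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    # Toy segment with three Cla I sites (not the real locus sequence).
    toy = "G" * 10 + "ATCGAT" + "G" * 20 + "ATCGAT" + "G" * 50 + "ATCGAT" + "G" * 5
    cuts = cut_positions(toy)
    print("cut positions:", cuts)
    print("internal fragments (bp):", internal_fragments(cuts))
```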

EXAMPLE 12 

Transcriptional activity of the p28 multigene family 

The transcriptional activity was evaluated by RT-PCR for 
10 p28 genes including p28-10, p28-11, p28-12, p28-13, p28-14, 
p28-15, p28-17, p28-18, p28-20, and p28-21 (Fig. 5). These genes 
were selected for transcriptional analysis because they represented 
genes tightly clustered together (p28-10 to p28-13), genes with 
larger intergenic spaces (p28-14 to p28-18), or genes in the 
complementary strand (p28-20 and p28-21). To ensure the 
specificity of RT-PCR, each primer pair was designed to be specific 
for a single p28 gene only. DNA bands of expected size were 
observed in ethidium-bromide stained agarose gels of the RT-PCR 
products for the following genes: p28-10, p28-11, p28-12, p28-15, 
p28-18, and p28-20. No DNA band was detected in 
ethidium-bromide stained agarose gels of RT-PCR products of the following 
genes: p28-13, p28-14, p28-17, and p28-21. The rest of the p28 
genes were not investigated for their transcription. In the controls, 
no DNA was amplified from any genes by PCR reactions from which 
reverse transcriptase was omitted. All the primer pairs produced 
products of the expected size when using E. chaffeensis genomic 
DNA as template (data not shown). 

EXAMPLE 13 



p28 genes were monocistronic 

Monocistronic mRNA represents a single gene and 
polycistronic mRNA codes for several proteins. Two adjacent p28 
genes might be polycistronically transcribed if both genes yield 
RT-PCR products. Two adjacent genes were monocistronically 
transcribed if one gene yielded an RT-PCR product and the other 
yielded no RT-PCR product. From Fig. 5, it was deduced that the 
following pairs of genes were not polycistronically transcribed: 
p28-12 and p28-13, p28-14 and p28-15, p28-17 and p28-18, and p28-20 
and p28-21. The detection of p28-10 to p28-12 by RT-PCR indicated 
they might have been transcribed polycistronically. However, an 
RT-PCR experiment using the p28-10 gene forward primer and the 
p28-11 gene reverse primer failed to produce any PCR product. 
Furthermore, amplification with the p28-11 gene forward primer 
and the p28-12 gene reverse primer to amplify p28-11 and p28-12 
as a single DNA fragment failed to yield product. However, both 
pairs of primers amplified the corresponding DNA segments. These 
data indicated that these genes were monocistronically transcribed. 
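
The deduction described above follows a simple rule that can be written out explicitly. The sketch below encodes the RT-PCR outcomes reported in Example 12 (product detected or not) and applies the stated rule: adjacent genes with discordant results cannot share one transcript, while adjacent genes that are both detected remain candidates and need the additional bridging RT-PCR described in the text. The dictionary and function names are hypothetical.

```python
# RT-PCR outcome per gene number, as reported in Example 12.
DETECTED = {
    10: True, 11: True, 12: True, 13: False, 14: False,
    15: True, 17: False, 18: True, 20: True, 21: False,
}


def adjacent_pairs(results):
    """Yield (gene, gene + 1) pairs for which both genes were tested."""
    for gene in sorted(results):
        if gene + 1 in results:
            yield gene, gene + 1


def not_polycistronic(results):
    """Adjacent pairs where exactly one gene gave an RT-PCR product."""
    return [(a, b) for a, b in adjacent_pairs(results)
            if results[a] != results[b]]


def needs_bridging_test(results):
    """Adjacent pairs where both genes gave a product; a cross-gene
    RT-PCR is still needed to exclude a shared transcript."""
    return [(a, b) for a, b in adjacent_pairs(results)
            if results[a] and results[b]]


if __name__ == "__main__":
    print("not polycistronic:", not_polycistronic(DETECTED))
    print("needs bridging RT-PCR:", needs_bridging_test(DETECTED))
```

Pairs in which neither gene gave a product (here p28-13 and p28-14) are uninformative under this rule and appear in neither list.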
EXAMPLE 14 



The P28s were divergent among the E. chaffeensis isolates 

A p28 gene corresponding to p28-19 of the Arkansas strain 
was sequenced previously in four additional E. chaffeensis isolates 
(Yu et al., 1999b). Clustal alignment indicated that none 
of the P28 proteins of the Arkansas strain had an amino acid 
sequence identical to the single sequenced P28 of the four E. chaffeensis 
isolates. The sequenced P28s from all four isolates were most 
similar (85-86%) to the P28-19 protein of the Arkansas strain. Thus, 
they were analogs of P28-19 of the Arkansas strain. 



Discussion 

The complete sequence of an entire locus of p28 genes 
is reported herein for the first time. Complete sequencing of the 
p28 multigene locus in E. chaffeensis in this study will contribute to 
the investigation of the origin of the multigene family and the 
function of the multigenes. Gene families are thought to have arisen 
by duplication of an original ancestral gene, with different members 
of the family then diverging as a consequence of mutations during 
evolution. The most conserved p28 gene among the species of 
Ehrlichia should be the ancestral gene. E. chaffeensis p28-15 to 
p28-19 are the genes most similar to the p28 of E. canis and E. 
muris. Therefore, the p28 genes might have arisen from one of the 
p28-15 to p28-19 genes. The wide presence of the p28/msp-2 
multigenes in Ehrlichia, Anaplasma, and Cowdria indicates that 
these organisms are phylogenetically related. The significant 
sequence identity between the p28 multigene family and the msp-2 
multigene family indicates that the two gene families originated 
from a common ancestor gene. 

p28 genes corresponding to p28-14 to p28-19 were 
sequenced previously and designated as omp-1b to omp-1f and p28 
by Ohashi et al. (1998b) and ORF-1 to ORF-5 by Reddy et al. (1998). 
An alphabetic letter or a number assigned to each gene attempted to 
indicate the order and position of the genes in the locus. Neither 
the previously assigned letters nor the numbers truly represent the 
position of the genes in the locus as revealed when it was sequenced 
completely. Thus, the genes were renamed to best represent the 
order of the genes in the complete locus. P28 was used as the name 
of the protein because it accurately describes the molecular mass of 
an immunodominant protein which was determined before its gene 
was sequenced (Chen et al., 1994, Yu et al., 1993) and also because 
p28 was used as the gene name when the first p28 gene 
was cloned and sequenced (Ohashi et al., 1998b). 

Six p28 genes were expressed in cell culture under the 
particular conditions of the investigation among the 10 genes 
studied. The genes for which transcription was not detected by 
RT-PCR are possibly not silent genes either, since all the genes were 
complete genes, i.e., no truncated form of the p28 genes was found. 
They may be expressed under other conditions. These results were 
consistent with previous data, which detected multiple bands from 
22-29 kDa with a monoclonal antibody (Yu et al., 1993, 1999b). In 
contrast, a previous study detected only a single p28 gene 
transcribed in cell culture (Reddy et al., 1998). PCR primer 
specificity may have contributed to the failure to detect the 
transcription of multiple genes in the previous study. With the 
limited knowledge of the DNA sequences at that time, although 
primers were designed to attempt to amplify as many p28 genes as 
possible, the primer pair (R72 and R74) from the previous study was 
perfectly matched to only three of the 21 p28 genes (p28-16, -17, 
and -19). The previous study demonstrated that p28-19 (orf-5) was 
transcriptionally active and p28-16 and p28-17 were inactive 
transcriptionally. In the results herein, p28-17 was also 
transcriptionally inactive. The transcriptional activity of p28-16 and 
p28-19 was not analyzed. It was possible to detect transcriptional 
activity in more p28 genes herein because specific primers were 
used for each p28 gene. 

The natural cycles of Ehrlichia involve a tick vector and 
mammalian hosts. Mammals are infected with Ehrlichia by the bite 
of infected ticks, and non-infected ticks acquire Ehrlichia by a blood 
meal from infected animals. Ehrlichia are not transovarially 
transmitted from one generation of ticks to the next (Rikihisa, 
1991). Therefore, the mammalian hosts are essential for the 
maintenance of Ehrlichia in nature. Carrier animals serve as the 
reservoirs for Ehrlichia organisms (Swift and Thomas, 1983, Zaugg 
et al., 1986). The persistent infection and carrier status indicate that 
Ehrlichia organisms have evolved one or more mechanisms to 
circumvent the host immune system. Some bacterial pathogens are 
endowed with sophisticated mechanisms to adapt to a rapidly 
changing microenvironment in the host. One such system is the 
reversible switching of the expression of the array of cell surface 
components exposed to the host defense system. 

Homologous recombination of genes in multigene 
families has contributed to the persistent infection of Borrelia 
hermsii (Schwan and Hinnebusch, 1998) and Neisseria gonorrhoeae 
(Haas and Meyer, 1986). Homologous recombination of the p28 
multigenes has been hypothesized (Reddy and Streck, 1999). 
However, no homologous recombination of p28 genes of Ehrlichia 
has yet been demonstrated. Homologous recombination was not 
observed in different passages of E. chaffeensis or E. canis, which 
have been passaged for several years. The DNA sequences of p28 
genes published by different laboratories are identical despite the 
different passage histories (Ohashi et al., 1998b, Reddy et al., 1998, 
Yu et al., 1999b), suggesting a lack of recombination as a 
mechanism of generation of genetic diversity. Moreover, the DNA 
sequences of five p28 genes in a locus of E. canis Jake and 
Oklahoma isolates are identical despite the temporal and geographic 
separation of these isolates in nature. The genetic variation of the 
p28 gene among strains of E. chaffeensis is very likely caused by 
random mutation over a long period of evolution of the gene rather 
than by homologous recombination. 

The p28 genes may be expressed differentially. Neither 
the E. chaffeensis nor the E. canis p28 multigenes are one 
polycistronic gene. Antigenically and structurally distinct msp-2 
genes have been expressed in acute A. marginale rickettsemia in 
experimentally infected calf (Eid et al., 1996, French et al., 1999). 
Protein immunoblotting detected 2-4 proteins in cell culture with a 
monoclonal antibody to a P28 of E. chaffeensis (Yu et al., 
1993, 1999b). Although several E. chaffeensis p28 genes are 
transcribed in cell culture, a clone of tick-inoculated E. chaffeensis 
may differentially and sequentially express the p28 multigene family 
in vivo to evade the host immune system. Different P28 proteins 
may have similar structure and function for E. chaffeensis, but 
different antigenicity. The hypervariable regions are predicted to 
contain antigenic epitopes which are surface exposed (Yu et al., 
1999b). Thus, the P28s may be essential for immune escape. 

It was demonstrated that only 40% of convalescent sera 
of monocytotropic ehrlichiosis patients had antibodies to P28-19. 
Patient serum that reacted with the particular P28 of one strain of 
E. chaffeensis might not react with the protein in another strain in 
which the amino acid sequences of the hypervariable regions differ 
substantially (Chen et al., 1997, Yu et al., 1999c). The data suggest 
that the apparent antigenic variability of the P28 may be explained 
in part by differential expression of the p28 multigene family. 
